Building Databases
Scientific Notation
When you first set up R and RStudio, R may display large numbers in scientific notation. So you’ll see numbers like this \[ \texttt{5.234e+10} \] The e+10 part is an exponent: it means "times 10 to the 10th power". Here’s a simpler example to understand what this means. Let’s start with a number like 28.
In scientific notation this would look like \[ \texttt{2.8e+01} \] or, in the more familiar exponential form, \[ 2.8 \times 10^{1} \] So it’s 2.8 times 10 to the first power. 280 would look like this \[ \texttt{2.8e+02} \quad \text{or} \quad 2.8 \times 10^{2} \] It’s basically an easier way to represent larger numbers like 280 million (280,000,000) \[ \texttt{2.8e+08} \quad \text{or} \quad 2.8 \times 10^{8} \] To turn this setting off, do the following
options(scipen = 999)
If you want to turn it back on, do this
options(scipen = 0)
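To see the effect of the setting, here is a minimal sketch that formats the same number under both values of scipen. The object names x, old, sci, and fixed are made up for this illustration.

```r
x <- 280000000                 # 280 million

old <- options(scipen = 0)     # the default: R may switch to scientific notation
sci <- format(x)               # "2.8e+08"

options(scipen = 999)          # strongly prefer ordinary (fixed) notation
fixed <- format(x)             # "280000000"

print(c(sci, fixed))

options(old)                   # put back whatever setting you had before
```

Saving the previous setting in old and restoring it at the end means the example does not change how the rest of your session prints numbers.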
More work on databases
Let’s build a small dataset based on COVID-19 figures from the New York Times.
- Create your objects
- Make sure to put quotation marks around values that are names or titles (remember, these are categorical variables)
Countries <- c("United States", "India", "Brazil", "Russia", "UK")
Total_Cases <- c(24249722, 10581837, 8511770, 3574330, 3466849)
Total_Deaths <- c(400810, 152556, 210299, 65632, 91470)
Then you can use the data.frame command to put them all together
Covid <- data.frame(Countries, Total_Cases, Total_Deaths)
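Once the data frame exists, it helps to check that it came out the way you expected. The following is a small sketch using base R functions only; it repeats the objects from above so that it runs on its own.

```r
Countries <- c("United States", "India", "Brazil", "Russia", "UK")
Total_Cases <- c(24249722, 10581837, 8511770, 3574330, 3466849)
Total_Deaths <- c(400810, 152556, 210299, 65632, 91470)
Covid <- data.frame(Countries, Total_Cases, Total_Deaths)

str(Covid)                             # column names and the type of each column
nrow(Covid)                            # number of rows (one per country)
Covid$Total_Cases                      # pull out a single column with $
Covid[Covid$Countries == "Brazil", ]   # pull out the row for one country
```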
You could actually do all these steps at the same time
Covid_Again <- data.frame(Countries = c("United States", "India",
                                        "Brazil", "Russia", "UK"),
                          Total_Cases = c(24249722, 10581837, 8511770,
                                          3574330, 3466849),
                          Total_Deaths = c(400810, 152556, 210299, 65632, 91470))
Another nice way to make a dataset is by using a tibble.
Tibbles are part of the tidyverse and simplify the code somewhat. Notice that the command used here is actually tribble (short for "transposed tibble"), which lets you type the data in row by row.
library(tidyverse)  # provides tribble(), mutate(), and rename()

Covid_TR <- tribble(
  ~Countries,      ~Total_Cases, ~Total_Deaths,
  "United States", 24249722,     400810,
  "India",         10581837,     152556,
  "Brazil",        8511770,      210299,
  "Russia",        3574330,      65632,
  "UK",            3466849,      91470
)
Using tribble is nice because the code is laid out more like a spreadsheet.
Notice that the ~ marks the column (variable) names in the first line, and each line after that is one row of data.
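For comparison, the same tibble can also be built column by column with the tibble() function, much like data.frame. This is only a sketch: the name Covid_TB is made up for this example, and it assumes the tibble package is installed (it is loaded for you when you run library(tidyverse)).

```r
library(tibble)

# Column-by-column version: one argument per variable
Covid_TB <- tibble(
  Countries    = c("United States", "India", "Brazil", "Russia", "UK"),
  Total_Cases  = c(24249722, 10581837, 8511770, 3574330, 3466849),
  Total_Deaths = c(400810, 152556, 210299, 65632, 91470)
)

Covid_TB
```

Which form to use is mostly a matter of taste: tribble is easier to read for small hand-typed tables, while tibble fits better when you already have the columns as objects.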
Manipulate Data
Mortality rate is total deaths divided by the total number of cases. You can use R to calculate this for you and then create the object.
Mortality_Rate <- Total_Deaths / Total_Cases
Then we can combine all four variables to remake our Covid data.frame
Covid <- data.frame(Countries, Total_Cases, Total_Deaths, Mortality_Rate)
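The division works element by element, so each country's deaths are divided by that same country's cases. As a small sketch, the rate can also be turned into a percentage that is easier to read. The name Mortality_Pct is made up for this example.

```r
Countries <- c("United States", "India", "Brazil", "Russia", "UK")
Total_Cases <- c(24249722, 10581837, 8511770, 3574330, 3466849)
Total_Deaths <- c(400810, 152556, 210299, 65632, 91470)

# First deaths / first cases, second deaths / second cases, and so on
Mortality_Rate <- Total_Deaths / Total_Cases

# Multiply by 100 and round to two decimals to get a percentage
Mortality_Pct <- round(Mortality_Rate * 100, 2)

Mortality_Pct[1]   # United States: 400810 / 24249722 is about 1.65 percent
```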
The tidyverse provides some other helpful tools here if we are using tibbles.
We can use mutate to add in the other variable based on a computation.
Covid_TR <- mutate(Covid_TR, Mortality_Rate = Total_Deaths / Total_Cases)
We can use rename to change the name of our variable
Covid_TR <- rename(Covid_TR, Mortality = Mortality_Rate)
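The two steps can also be chained together with the pipe operator %>%, which passes the result of one step into the next. This is a sketch of one possible way to write it, assuming the dplyr and tibble packages are installed (both come with the tidyverse); it rebuilds Covid_TR first so that it runs on its own.

```r
library(dplyr)
library(tibble)

Covid_TR <- tribble(
  ~Countries,      ~Total_Cases, ~Total_Deaths,
  "United States", 24249722,     400810,
  "India",         10581837,     152556,
  "Brazil",        8511770,      210299,
  "Russia",        3574330,      65632,
  "UK",            3466849,      91470
)

# Add the computed column, then rename it, in one chain
Covid_TR <- Covid_TR %>%
  mutate(Mortality_Rate = Total_Deaths / Total_Cases) %>%
  rename(Mortality = Mortality_Rate)

names(Covid_TR)   # the new column is now called Mortality
```

Note that rename takes the new name on the left and the old name on the right.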
You can practice this on your own.